Intro¶

The GameStop saga that unfolded in early 2021 was a significant event in financial markets, involving unprecedented activity by retail investors that was amplified by social media and trading technology.

This case study looks at how a sentiment of bewilderment spread in the r/WallStreetBets subreddit.

In [1]:
import sys
sys.path.append('../')
import pandas as pd
from src import utils, plot_utils

Part 1. Data setup¶

This notebook uses the r/WallStreetBets posts dataset. A later iteration will also use the comments dataset.

In [2]:
interim_data_path = '../data/interim/posts_df.csv'

# EITHER:

# 1. Load interim data OR
posts_df = pd.read_csv(interim_data_path, parse_dates=True)

# # 2. Process interim data (takes ages)
# posts_df = utils.get_posts_df('GameStop', 'GME')

# # Save interim data
# posts_df.to_csv(interim_data_path, index=True)
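
The load-or-recompute pattern in the cell above recurs several times in this notebook. As a minimal sketch, it could be wrapped in a small helper. The name `load_or_compute` and the demo data are hypothetical and not part of `src.utils`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_compute(path, compute, **read_kwargs):
    """Load the CSV at `path` if it exists; otherwise call `compute()`, save, and return it.

    Hypothetical helper for illustration; `read_kwargs` are passed to pd.read_csv.
    """
    path = Path(path)
    if path.exists():
        return pd.read_csv(path, **read_kwargs)
    df = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=True)
    return df


# Demo with a throwaway directory and a stand-in for the slow processing step.
calls = []


def expensive_step():
    calls.append(1)  # record how often the slow path runs
    return pd.DataFrame({"n_posts": [3, 5]}, index=pd.Index(["a", "b"], name="key"))


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "interim" / "demo.csv"
    first = load_or_compute(cache, expensive_step, index_col="key")   # computes and saves
    second = load_or_compute(cache, expensive_step, index_col="key")  # loads from disk
```

In this notebook the call might look like `load_or_compute(interim_data_path, lambda: utils.get_posts_df('GameStop', 'GME'))`, with whatever `read_csv` options the saved file needs.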

Let's also get the GameStop price data.

In [3]:
gme_df = utils.get_stock_prices("GME", "2020-12-08", "2021-02-04")
(39, 8)
In [4]:
# Calculate 20-day and 5-day simple moving averages (SMA) of the closing price
gme_df['sma20'] = gme_df['Close'].rolling(window=20).mean()
gme_df['sma5'] = gme_df['Close'].rolling(window=5).mean()

In addition to the simple moving average, let's also calculate the exponential moving average (EMA), which places greater weight on the most recent data points. The EMA applies a weighting factor that decreases exponentially with age, so its current value includes all prior price data: the most recent price has the highest weight, and each preceding price's weight is reduced by the exponential factor. The main benefit of an EMA over a simple moving average (SMA) is that it reacts faster to recent price changes, which can make it more effective for short-term trading and for data sets with sudden sharp changes.

In [5]:
gme_df['ema_5'] = gme_df['Close'].ewm(span=5, adjust=False).mean()
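
To make the weighting concrete, here is a small sketch on made-up prices (not GME data). With `adjust=False`, pandas computes the EMA recursively as `ema[t] = alpha * price[t] + (1 - alpha) * ema[t-1]`, with `alpha = 2 / (span + 1)` and `ema[0] = price[0]`. The helper `ema_manual` is a hypothetical name used only to check that recursion against `ewm`.

```python
import pandas as pd


def ema_manual(values, span):
    """Recursive EMA matching pandas' ewm(span=span, adjust=False).mean()."""
    alpha = 2 / (span + 1)
    out = []
    for x in values:
        # The first value seeds the average; later values blend new price and previous EMA.
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


# Made-up series: flat at 10, then a sudden jump to 40 on the last day.
close = pd.Series([10.0, 10.0, 10.0, 10.0, 10.0, 40.0])

ema = close.ewm(span=5, adjust=False).mean()
sma = close.rolling(window=5).mean()
manual = ema_manual(close.tolist(), span=5)

# After the jump, the 5-period EMA is about 20.0 while the 5-period SMA is 16.0.
print(ema.iloc[-1], sma.iloc[-1])
```

The EMA moves further toward the new price than the SMA on the day of the jump, which is the faster reaction described above.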

Part 2. Looking at keywords expressing bewilderment¶

If someone says “It doesn’t make sense that stock X is going down,” they have 12 reasons it should be going up; since it isn’t, there must be something they don’t know. So stock X has a lot going on pulling it in the other direction.

This analysis uses the following keywords as a proxy for the sentiment expressing bewilderment: 'feel', 'believe', 'sense', 'why does', 'how does', 'how come', 'how much', 'when will', 'isn’t it', 'wait for', 'add to', 'should I', 'should we','scale in', 'scale out'. The idea is that when those pick up, we’ll know that something has made a strong move.

This is a first step toward refining the bewilderment metric, and toward the broader study of how an idea or even a feeling spreads in the markets.

This section looks at the total count and percentage of the bewilderment keywords in our main dataframe of GameStop posts, whether in a post's title or text, so that we can aggregate them by hour or by day and plot the GameStop price alongside.
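
The aggregation described above can be sketched roughly as follows. This is not the implementation of `utils.get_total_keyword_df`. The column names `date`, `title` and `text`, the function name `keyword_stats`, and the definition of the percentage (share of posts containing at least one keyword) are all assumptions made for illustration.

```python
import re

import pandas as pd

KEYWORDS = [
    'feel', 'believe', 'sense', 'why does', 'how does', 'how come', 'how much',
    'when will', 'isn’t it', 'wait for', 'add to', 'should I', 'should we',
    'scale in', 'scale out',
]

# Case-insensitive match on whole words or phrases.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)


def keyword_stats(posts, freq="D"):
    """Aggregate keyword hits per period (assumed columns: date, title, text)."""
    text = posts["title"].fillna("") + " " + posts["text"].fillna("")
    hits = text.map(lambda s: len(PATTERN.findall(s)))
    frame = pd.DataFrame(
        {"hits": hits.to_numpy(), "has_hit": (hits > 0).astype(float).to_numpy()},
        index=pd.DatetimeIndex(pd.to_datetime(posts["date"])),
    )
    grouped = frame.resample(freq)
    return pd.DataFrame({
        "keyword_count": grouped["hits"].sum(),         # total keyword hits
        "n_posts": grouped["hits"].size(),              # posts in the period
        "keyword_pct": 100 * grouped["has_hit"].mean(), # % of posts with >= 1 hit
    })


# Made-up posts for demonstration only.
posts = pd.DataFrame({
    "date": ["2021-01-26 09:00", "2021-01-26 15:30", "2021-01-27 10:00"],
    "title": ["Why does GME keep going up?", "GME to the moon", "Should I add to my position?"],
    "text": ["It makes no sense", None, "How much higher can it go"],
})
daily = keyword_stats(posts, freq="D")
print(daily)
```

Passing an hourly frequency string instead of `"D"` gives the hourly view. Periods without any posts come out with a count of 0 and a percentage of NaN.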

In [14]:
interim_data_path = '../data/interim/hourly_keyword_df.csv'

## EITHER:

# 1. Load interim data OR
hourly_keyword_df = pd.read_csv(interim_data_path, index_col='date', parse_dates=True)

# # 2. Process interim data
# hourly_keyword_df = utils.get_total_keyword_df(posts_df, 'H')

# # Save interim data
# hourly_keyword_df.to_csv(interim_data_path, index=True)

# hourly_keyword_df.shape
In [13]:
interim_data_path = '../data/interim/daily_keyword_df.csv'

## EITHER:

# 1. Load interim data OR
daily_keyword_df = pd.read_csv(interim_data_path, index_col='date', parse_dates=True)

# # 2. Process interim data (takes ages)
# daily_keyword_df = utils.get_total_keyword_df(posts_df,'D')

# # Save interim data
# daily_keyword_df.to_csv(interim_data_path, index=True)

# daily_keyword_df.shape

Plot total keyword counts together with the GME price.

In [8]:
plot_utils.plot_total_keywords_counts(hourly_keyword_df, gme_df)

Look at total keyword percentage of all chatter.

This graph also includes annotations of the three major developments of the GameStop saga:

  1. The Short Squeeze: The initial event that kicked off the saga was a massive short squeeze on GameStop's stock. A short squeeze occurs when a heavily shorted stock starts to rise in value, forcing short sellers to buy shares to cover their positions, which in turn pushes the price even higher. In the case of GameStop, an army of retail investors, many of whom were users on the Reddit forum r/WallStreetBets, started buying up GameStop shares, causing the price to skyrocket and triggering the short squeeze.

  2. The Role of Social Media and Retail Trading Platforms: The GameStop event underscored the influence of social media and new trading platforms in the financial markets. Platforms like Robinhood have made it easier for retail investors to trade, and forums like r/WallStreetBets have allowed these investors to share ideas and rally around specific causes, like buying GameStop shares. The event sparked a debate about market manipulation, the role of social media in investing, and the democratization of financial markets.

  3. Trading Restrictions and Subsequent Controversy: In response to the extreme volatility in GameStop's stock, several trading platforms, including Robinhood, temporarily restricted trading in the stock. These restrictions were met with significant backlash from users and even led to a number of lawsuits. Critics argued that the trading platforms were protecting institutional investors at the expense of retail investors. The controversy prompted calls for increased regulation and transparency in the financial industry.

In [9]:
annotations = [
    dict(
        x='2021-01-13',
        y=gme_df.loc[gme_df['Date'] == '2021-01-13', 'High'].values[0],
        xref='x', yref='y2',
        text='Event 1: Initial Short Squeeze',
        showarrow=True,
        arrowhead=1,
        ax=0,
        ay=-50
    ),
    dict(
        x='2021-01-27',
        y=gme_df.loc[gme_df['Date'] == '2021-01-27', 'High'].values[0],
        xref='x', yref='y2',
        text='Event 2: Peak of the Rally',
        showarrow=True,
        arrowhead=4,
        ax=-70,
        ay=-50
    ),
    dict(
        x='2021-01-28',
        y=gme_df.loc[gme_df['Date'] == '2021-01-28', 'High'].values[0],
        xref='x', yref='y2',
        text='Event 3: Trading Restrictions Implemented',
        showarrow=True,
        arrowhead=1,
        ax=50,
        ay=-50
    )
]
In [10]:
plot_utils.plot_total_keyword_percentage(hourly_keyword_df, gme_df, annotations)

Look at the percentage of all chatter per keyword.

In [15]:
interim_data_path = '../data/interim/per_keyword_df.csv'

## EITHER:

# 1. Load interim data OR
per_keyword_df = pd.read_csv(interim_data_path, index_col='date', parse_dates=True)

# # 2. Process interim data
# per_keyword_df = utils.get_per_keyword_df(posts_df, 'D')

# # Save interim data
# per_keyword_df.to_csv(interim_data_path, index=True)

# per_keyword_df.shape
In [12]:
plot_utils.plot_per_keyword_percentages(per_keyword_df, gme_df)

Next steps:

  1. Get streaming social media data from traders' forums
  2. Use purpose-built NLP techniques to quantify sentiment
  3. Verify correlation with statistical techniques
  4. Control for confounding factors (overall market movements, company-specific news/events)
  5. Test out of sample (different stock, different social media, different granularity)
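
As one possible starting point for step 3, the sketch below computes a rank (Spearman-style) correlation between a signal and a target at several lags. It runs on synthetic data in which the target is built to follow the signal by one day, so it shows the mechanics only and says nothing about GME. The function name `lagged_rank_corr` and all data are hypothetical.

```python
import numpy as np
import pandas as pd


def lagged_rank_corr(signal, target, max_lag=3):
    """Rank correlation between signal[t - lag] and target[t] for lag = 0..max_lag."""
    rows = {}
    for lag in range(max_lag + 1):
        pair = pd.concat(
            [signal.shift(lag), target], axis=1, keys=["signal", "target"]
        ).dropna()
        # Pearson correlation of ranks is the Spearman correlation.
        rows[lag] = pair["signal"].rank().corr(pair["target"].rank())
    return pd.Series(rows, name="rank_corr").rename_axis("lag")


# Synthetic example: the target follows the signal with a one-day delay plus noise.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-12-08", periods=200)
signal = pd.Series(rng.normal(size=200), index=dates)
target = signal.shift(1) + 0.1 * pd.Series(rng.normal(size=200), index=dates)

result = lagged_rank_corr(signal, target, max_lag=3)
print(result)
```

On real data, the signal might be the daily keyword percentage and the target the absolute daily return. Steps 4 and 5 would still be needed before reading anything into the result.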

July 27, 2020: Roaring Kitty, aka Keith Gill, starts posting YouTube videos on GameStop, saying he sees value in the heavily shorted stock he had been investing in since June 2019. Gill also posts on the Reddit forum WallStreetBets under the name DeepF***ingValue, according to an interview in the Wall Street Journal.

Dec 8, 2020: GameStop shares tank after company misses Wall Street estimates for quarterly revenue as pandemic-led store closures and intense competition from digital-game sellers hit sales.

Jan 11: GameStop appoints Chewy.com founder and two other e-commerce veterans to its board in a deal with investor Ryan Cohen’s RC Ventures, as it doubles down on digital sales.

Jan 12: Short interest at 70.9 mln shares, down from 71.2 mln on Jan 8, per S3 Partners. Notional value of short bets rose to $1.4 bln from $1.3 bln, reflecting the rising stock price.

Jan 13: GameStop shares rise 57%, followed by another 27% jump the next day to $39.90. Its median target price among analysts is $12.50.

Jan 19: Short seller Citron Research takes aim. Tweets about GameStop, saying buyers at these levels are “the suckers at this poker game” and stock “back to $20 fast.”

Jan 20: Citron Research delays negative report, says it does not want to go live with its report on the stock.

Jan 22: Shares rise another 50%.

Jan 25: GameStop stock soars as much as 144%, then settles up 18%, with retail traders storming in to buy more.

Jan 25: Hedge fund Melvin Capital Management receives $2.75 billion investment from Citadel, the Chicago-based hedge fund led by Ken Griffin, and billionaire investor Steven A. Cohen’s Point72 Asset Management, after losing on a series of short bets, including on GameStop. Griffin also founded Citadel Securities, one of several market-making firms that pay to execute customer orders from Robinhood.

Jan 26: Elon Musk tweets "Gamestonk!!", along with a link to Reddit's Wallstreetbets stock trading discussion group, where supporters refer to the Tesla CEO as "Papa Musk."

Jan 26: Shares surge 92.7%. Top securities regulator in Massachusetts reportedly says trading in GameStop suggests there is something “systemically wrong” with the options trading.

Jan 27: Trading volumes in U.S. cash equities and options hit all-time record levels at 24.5 billion shares traded and 57.1 million contracts traded.

Jan 27: GameStop hits a closing high of $347.51.

Jan 27: Melvin Capital and Citron close the majority of their GameStop position at a loss.

Jan 28: Robinhood, along with several other brokerages, restricts trading in GameStop and a handful of stocks after the regulatory deposit requirements for settling the securities skyrocket. With buying restricted at many brokerages, but selling allowed, the stocks sell off.

Jan 28: The U.S. House Financial Services and Senate Banking committees say they will hold a Feb. 18 hearing on the stock market following the trading restrictions.

Jan 29: Robinhood begins easing trading restrictions on stocks caught up in the so-called Reddit rally.

Feb 1: Robinhood raises $2.4 billion in capital after raising $1 billion the previous week.

Feb 2: U.S. Treasury Secretary Janet Yellen calls meeting of top financial regulators to discuss market volatility driven by retail trading in GameStop and other stocks.

Feb 4: Robinhood removes all trading curbs. The following day, GameStop hits a session high of $95 and closes up 19.20% at $63.77.

Feb 9: GameStop trading at around $40 a share.

Feb 18: U.S. House panel holds hearing titled: "Game Stopped? Who Wins and Loses When Short Sellers, Social Media, and Retail Investors Collide."
